skip to main content


Search for: All records

Creators/Authors contains: "Straszyński, Juliusz"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Sequence mappability is an important task in genome resequencing. In the (km)-mappability problem, for a given sequenceTof lengthn, the goal is to compute a table whoseith entry is the number of indices$$j \ne i$$jisuch that the length-msubstrings ofTstarting at positionsiandjhave at mostkmismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of$$k=1$$k=1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for$$k=O(1)$$k=O(1), works in$$O(n)$$O(n)space and, with high probability, in$$O(n \cdot \min \{m^k,\log ^k n\})$$O(n·min{mk,logkn})time. Our algorithm requires a careful adaptation of thek-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop$$O(n^2)$$O(n2)-time algorithms to computeall(km)-mappability tables for a fixedmand all$$k\in \{0,\ldots ,m\}$$k{0,,m}or a fixedkand all$$m\in \{k,\ldots ,n\}$$m{k,,n}. Finally, we show that, for$$k,m = \Theta (\log n)$$k,m=Θ(logn), the (km)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.

     
    more » « less